Overview

Dataset statistics

Number of variables: 32
Number of observations: 119390
Missing cells: 129425
Missing cells (%): 3.4%
Duplicate rows: 31994
Duplicate rows (%): 26.8%
Total size in memory: 106.7 MiB
Average record size in memory: 936.7 B
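
The two percentages above can be reproduced from the raw counts. The sketch below is only an illustration: it assumes the missing-cell percentage is taken over all cells (variables times observations) and the duplicate percentage over all rows, which is consistent with the reported figures. The variable names are invented for this example.

```python
# Counts copied from the dataset statistics above.
n_variables = 32
n_observations = 119390
missing_cells = 129425
duplicate_rows = 31994

# Assumption: percentages are relative to all cells and all rows respectively.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```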

Variable types

NUM: 17
CAT: 13
BOOL: 2

Reproduction

Analysis started: 2020-04-01 06:28:18.886398
Analysis finished: 2020-04-01 06:33:58.806150
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 31994 (26.8%) duplicate rows | Duplicates
country has a high cardinality: 177 distinct values | High cardinality
reservation_status_date has a high cardinality: 926 distinct values | High cardinality
agent has 16340 (13.7%) missing values | Missing
company has 112593 (94.3%) missing values | Missing
babies is highly skewed (γ1 = 24.64654483) | Skewed
previous_cancellations is highly skewed (γ1 = 24.45804872) | Skewed
previous_bookings_not_canceled is highly skewed (γ1 = 23.53979995) | Skewed
reservation_status_date only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type
lead_time has 6345 (5.3%) zeros | Zeros
stays_in_weekend_nights has 51998 (43.6%) zeros | Zeros
stays_in_week_nights has 7645 (6.4%) zeros | Zeros
children has 110796 (92.8%) zeros | Zeros
babies has 118473 (99.2%) zeros | Zeros
previous_cancellations has 112906 (94.6%) zeros | Zeros
previous_bookings_not_canceled has 115770 (97.0%) zeros | Zeros
booking_changes has 101314 (84.9%) zeros | Zeros
days_in_waiting_list has 115692 (96.9%) zeros | Zeros
adr has 1959 (1.6%) zeros | Zeros
required_car_parking_spaces has 111974 (93.8%) zeros | Zeros
total_of_special_requests has 70318 (58.9%) zeros | Zeros
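
The skewness values (γ1) quoted in the messages above can be checked against the frequency tables in this report. The sketch below recomputes the figure for babies from its value counts. It assumes the report uses the bias-adjusted Fisher-Pearson estimator (the one pandas applies in Series.skew); the function name sample_skewness is hypothetical.

```python
from math import sqrt

# Value -> count table for `babies`, copied from this report.
babies_counts = {0: 118473, 1: 900, 2: 15, 9: 1, 10: 1}


def sample_skewness(counts):
    """Bias-adjusted Fisher-Pearson skewness from a value -> count table."""
    n = sum(counts.values())
    mean = sum(value * count for value, count in counts.items()) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum(count * (value - mean) ** 2 for value, count in counts.items()) / n
    m3 = sum(count * (value - mean) ** 3 for value, count in counts.items()) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment; assumed to match the estimator behind the report.
    return g1 * sqrt(n * (n - 1)) / (n - 2)


skew = sample_skewness(babies_counts)
print(round(skew, 3))
```

With 99.2% of the rows at zero, a handful of bookings with 9 or 10 babies is enough to push the skewness this high.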

Variables

hotel
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
City Hotel | 79330 | 66.4%
Resort Hotel | 40060 | 33.6%

Length

Max length: 12
Mean length: 10.67107798
Min length: 10

Value | Count | Frequency (%)
Lowercase_Letter | 8 | 66.7%
Uppercase_Letter | 3 | 25.0%
Space_Separator | 1 | 8.3%

Value | Count | Frequency (%)
Latin | 11 | 91.7%
Common | 1 | 8.3%

Value | Count | Frequency (%)
ASCII | 12 | 100.0%
 
is_canceled
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
0 | 75166 | 63.0%
1 | 44224 | 37.0%
 

lead_time
Real number (ℝ≥0)

ZEROS
Distinct count: 479
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.0114164
Minimum: 0
Maximum: 737
Zeros: 6345
Zeros (%): 5.3%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 18
median: 69
Q3: 160
95-th percentile: 320
Maximum: 737
Range: 737
Interquartile range (IQR): 142

Descriptive statistics

Standard deviation: 106.863097
Coefficient of variation (CV): 1.027416997
Kurtosis: 1.696448849
Mean: 104.0114164
Mean Absolute Deviation (MAD): 84.67197528
Skewness: 1.346549873
Sum: 12417923
Variance: 11419.72151
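
Several of the lead_time figures above follow from one another. The sketch below is an illustrative consistency check, not part of the report: it assumes the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. All variable names are made up for this example.

```python
# Values copied from the lead_time statistics above.
mean = 104.0114164
std = 106.863097
q1, q3 = 18, 160
minimum, maximum = 0, 737

# Derived quantities under the assumptions stated in the lead-in.
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean
variance = std ** 2

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.2f}")
```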
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 2.500e+00 4.500e+00 ... 6.065e+02 6.240e+02 6.275e+02 6.690e+02 7.370e+02], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 6345 | 5.3%
1 | 3460 | 2.9%
2 | 2069 | 1.7%
3 | 1816 | 1.5%
4 | 1715 | 1.4%
5 | 1565 | 1.3%
6 | 1445 | 1.2%
7 | 1331 | 1.1%
8 | 1138 | 1.0%
12 | 1079 | 0.9%
Other values (469) | 97427 | 81.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 6345 | 5.3%
1 | 3460 | 2.9%
2 | 2069 | 1.7%
3 | 1816 | 1.5%
4 | 1715 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
737 | 1 | < 0.1%
709 | 1 | < 0.1%
629 | 17 | < 0.1%
626 | 30 | < 0.1%
622 | 17 | < 0.1%
 
arrival_date_year
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
2016 | 56707 | 47.5%
2017 | 40687 | 34.1%
2015 | 21996 | 18.4%

Length

Max length: 4
Mean length: 4
Min length: 4

Value | Count | Frequency (%)
Decimal_Number | 6 | 100.0%

Value | Count | Frequency (%)
Common | 6 | 100.0%

Value | Count | Frequency (%)
ASCII | 6 | 100.0%
 
arrival_date_month
Categorical

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
August | 13877 | 11.6%
July | 12661 | 10.6%
May | 11791 | 9.9%
October | 11160 | 9.3%
April | 11089 | 9.3%
June | 10939 | 9.2%
September | 10508 | 8.8%
March | 9794 | 8.2%
February | 8068 | 6.8%
November | 6794 | 5.7%
Other values (2) | 12709 | 10.6%

Length

Max length: 9
Mean length: 5.903182846
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 18 | 69.2%
Uppercase_Letter | 8 | 30.8%

Value | Count | Frequency (%)
Latin | 26 | 100.0%

Value | Count | Frequency (%)
ASCII | 26 | 100.0%
 

arrival_date_week_number
Real number (ℝ≥0)

Distinct count: 53
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.16517296
Minimum: 1
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 16
median: 28
Q3: 38
95-th percentile: 49
Maximum: 53
Range: 52
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 13.60513836
Coefficient of variation (CV): 0.500830176
Kurtosis: -0.9860771763
Mean: 27.16517296
Mean Absolute Deviation (MAD): 11.54992462
Skewness: -0.01001432604
Sum: 3243250
Variance: 185.0997897
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 3.5 6.5 12.5 ... 49.5 50.5 51.5 52.5 53. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
33 | 3580 | 3.0%
30 | 3087 | 2.6%
32 | 3045 | 2.6%
34 | 3040 | 2.5%
18 | 2926 | 2.5%
21 | 2854 | 2.4%
28 | 2853 | 2.4%
17 | 2805 | 2.3%
20 | 2785 | 2.3%
29 | 2763 | 2.3%
Other values (43) | 89652 | 75.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1047 | 0.9%
2 | 1218 | 1.0%
3 | 1319 | 1.1%
4 | 1487 | 1.2%
5 | 1387 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
53 | 1816 | 1.5%
52 | 1195 | 1.0%
51 | 933 | 0.8%
50 | 1505 | 1.3%
49 | 1782 | 1.5%
 

arrival_date_day_of_month
Real number (ℝ≥0)

Distinct count: 31
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.79824106
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
median: 16
Q3: 23
95-th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.780829471
Coefficient of variation (CV): 0.5558105765
Kurtosis: -1.187168319
Mean: 15.79824106
Mean Absolute Deviation (MAD): 7.578562929
Skewness: -0.002000453979
Sum: 1886152
Variance: 77.10296619
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 4.5 5.5 ... 20.5 23.5 26.5 30.5 31. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
17 | 4406 | 3.7%
5 | 4317 | 3.6%
15 | 4196 | 3.5%
25 | 4160 | 3.5%
26 | 4147 | 3.5%
9 | 4096 | 3.4%
12 | 4087 | 3.4%
16 | 4078 | 3.4%
2 | 4055 | 3.4%
19 | 4052 | 3.4%
Other values (21) | 77796 | 65.2%

Minimum 5 values

Value | Count | Frequency (%)
1 | 3626 | 3.0%
2 | 4055 | 3.4%
3 | 3855 | 3.2%
4 | 3763 | 3.2%
5 | 4317 | 3.6%

Maximum 5 values

Value | Count | Frequency (%)
31 | 2208 | 1.8%
30 | 3853 | 3.2%
29 | 3580 | 3.0%
28 | 3946 | 3.3%
27 | 3802 | 3.2%
 

stays_in_weekend_nights
Real number (ℝ≥0)

ZEROS
Distinct count: 17
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9275986264
Minimum: 0
Maximum: 19
Zeros: 51998
Zeros (%): 43.6%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 2
Maximum: 19
Range: 19
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.9986134946
Coefficient of variation (CV): 1.076557755
Kurtosis: 7.174066064
Mean: 0.9275986264
Mean Absolute Deviation (MAD): 0.8079951985
Skewness: 1.38004645
Sum: 110746
Variance: 0.9972289116
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 8.5 11. 19. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 51998 | 43.6%
2 | 33308 | 27.9%
1 | 30626 | 25.7%
4 | 1855 | 1.6%
3 | 1259 | 1.1%
6 | 153 | 0.1%
5 | 79 | 0.1%
8 | 60 | 0.1%
7 | 19 | < 0.1%
9 | 11 | < 0.1%
Other values (7) | 22 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 51998 | 43.6%
1 | 30626 | 25.7%
2 | 33308 | 27.9%
3 | 1259 | 1.1%
4 | 1855 | 1.6%

Maximum 5 values

Value | Count | Frequency (%)
19 | 1 | < 0.1%
18 | 1 | < 0.1%
16 | 3 | < 0.1%
14 | 2 | < 0.1%
13 | 3 | < 0.1%
 

stays_in_week_nights
Real number (ℝ≥0)

ZEROS
Distinct count: 35
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.500301533
Minimum: 0
Maximum: 50
Zeros: 7645
Zeros (%): 6.4%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 3
95-th percentile: 5
Maximum: 50
Range: 50
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.908285615
Coefficient of variation (CV): 0.7632221914
Kurtosis: 24.28455482
Mean: 2.500301533
Mean Absolute Deviation (MAD): 1.364286816
Skewness: 2.862249242
Sum: 298511
Variance: 3.641553989
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 18.5 20.5 21.5 25.5 50. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
2 | 33684 | 28.2%
1 | 30310 | 25.4%
3 | 22258 | 18.6%
5 | 11077 | 9.3%
4 | 9563 | 8.0%
0 | 7645 | 6.4%
6 | 1499 | 1.3%
10 | 1036 | 0.9%
7 | 1029 | 0.9%
8 | 656 | 0.5%
Other values (25) | 633 | 0.5%

Minimum 5 values

Value | Count | Frequency (%)
0 | 7645 | 6.4%
1 | 30310 | 25.4%
2 | 33684 | 28.2%
3 | 22258 | 18.6%
4 | 9563 | 8.0%

Maximum 5 values

Value | Count | Frequency (%)
50 | 1 | < 0.1%
42 | 1 | < 0.1%
41 | 1 | < 0.1%
40 | 2 | < 0.1%
35 | 1 | < 0.1%
 

adults
Real number (ℝ≥0)

Distinct count: 14
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.856403384
Minimum: 0
Maximum: 55
Zeros: 403
Zeros (%): 0.3%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 2
Q3: 2
95-th percentile: 3
Maximum: 55
Range: 55
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5792609988
Coefficient of variation (CV): 0.3120340137
Kurtosis: 1352.115116
Mean: 1.856403384
Mean Absolute Deviation (MAD): 0.3428851878
Skewness: 18.31780476
Sum: 221636
Variance: 0.3355433048
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 4.5 55. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
2 | 89680 | 75.1%
1 | 23027 | 19.3%
3 | 6202 | 5.2%
0 | 403 | 0.3%
4 | 62 | 0.1%
26 | 5 | < 0.1%
27 | 2 | < 0.1%
20 | 2 | < 0.1%
5 | 2 | < 0.1%
55 | 1 | < 0.1%
Other values (4) | 4 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 403 | 0.3%
1 | 23027 | 19.3%
2 | 89680 | 75.1%
3 | 6202 | 5.2%
4 | 62 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
55 | 1 | < 0.1%
50 | 1 | < 0.1%
40 | 1 | < 0.1%
27 | 2 | < 0.1%
26 | 5 | < 0.1%
 

children
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 4
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1038899033
Minimum: 0
Maximum: 10
Zeros: 110796
Zeros (%): 92.8%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3985614448
Coefficient of variation (CV): 3.836382863
Kurtosis: 18.67369236
Mean: 0.1038899033
Mean Absolute Deviation (MAD): 0.192829741
Skewness: 4.112589542
Sum: 12403
Variance: 0.1588512253
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
0 | 110796 | 92.8%
1 | 4861 | 4.1%
2 | 3652 | 3.1%
3 | 76 | 0.1%
10 | 1 | < 0.1%
(Missing) | 4 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 110796 | 92.8%
1 | 4861 | 4.1%
2 | 3652 | 3.1%
3 | 76 | 0.1%
10 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
10 | 1 | < 0.1%
3 | 76 | 0.1%
2 | 3652 | 3.1%
1 | 4861 | 4.1%
0 | 110796 | 92.8%
 

babies
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.007948739425
Minimum: 0
Maximum: 10
Zeros: 118473
Zeros (%): 99.2%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.0974361913
Coefficient of variation (CV): 12.25806837
Kurtosis: 1633.948235
Mean: 0.007948739425
Mean Absolute Deviation (MAD): 0.01577537492
Skewness: 24.64654483
Sum: 949
Variance: 0.009493811375
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 5.5 10. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 118473 | 99.2%
1 | 900 | 0.8%
2 | 15 | < 0.1%
10 | 1 | < 0.1%
9 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 118473 | 99.2%
1 | 900 | 0.8%
2 | 15 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
10 | 1 | < 0.1%
9 | 1 | < 0.1%
2 | 15 | < 0.1%
1 | 900 | 0.8%
0 | 118473 | 99.2%
 

meal
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
BB | 92310 | 77.3%
HB | 14463 | 12.1%
SC | 10650 | 8.9%
Undefined | 1169 | 1.0%
FB | 798 | 0.7%

Length

Max length: 9
Mean length: 2.068540079
Min length: 2

Value | Count | Frequency (%)
Uppercase_Letter | 6 | 54.5%
Lowercase_Letter | 5 | 45.5%

Value | Count | Frequency (%)
Latin | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%
 

country
Categorical

HIGH CARDINALITY
Distinct count: 177
Unique (%): 0.1%
Missing: 488
Missing (%): 0.4%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
PRT | 48590 | 40.7%
GBR | 12129 | 10.2%
FRA | 10415 | 8.7%
ESP | 8568 | 7.2%
DEU | 7287 | 6.1%
ITA | 3766 | 3.2%
IRL | 3375 | 2.8%
BEL | 2342 | 2.0%
BRA | 2224 | 1.9%
NLD | 2104 | 1.8%
Other values (167) | 18102 | 15.2%

Length

Max length: 3
Mean length: 2.98928721
Min length: 2

Value | Count | Frequency (%)
Uppercase_Letter | 26 | 92.9%
Lowercase_Letter | 2 | 7.1%

Value | Count | Frequency (%)
Latin | 28 | 100.0%

Value | Count | Frequency (%)
ASCII | 28 | 100.0%
 

market_segment
Categorical

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
Online TA | 56477 | 47.3%
Offline TA/TO | 24219 | 20.3%
Groups | 19811 | 16.6%
Direct | 12606 | 10.6%
Corporate | 5295 | 4.4%
Complementary | 743 | 0.6%
Aviation | 237 | 0.2%
Undefined | 2 | < 0.1%

Length

Max length: 13
Mean length: 9.01976715
Min length: 6

Value | Count | Frequency (%)
Lowercase_Letter | 17 | 65.4%
Uppercase_Letter | 7 | 26.9%
Space_Separator | 1 | 3.8%
Other_Punctuation | 1 | 3.8%

Value | Count | Frequency (%)
Latin | 24 | 92.3%
Common | 2 | 7.7%

Value | Count | Frequency (%)
ASCII | 26 | 100.0%
 
distribution_channel
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
TA/TO | 97870 | 82.0%
Direct | 14645 | 12.3%
Corporate | 6677 | 5.6%
GDS | 193 | 0.2%
Undefined | 5 | < 0.1%

Length

Max length: 9
Mean length: 5.343303459
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 11 | 55.0%
Uppercase_Letter | 8 | 40.0%
Other_Punctuation | 1 | 5.0%

Value | Count | Frequency (%)
Latin | 19 | 95.0%
Common | 1 | 5.0%

Value | Count | Frequency (%)
ASCII | 20 | 100.0%
 
is_repeated_guest
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
0 | 115580 | 96.8%
1 | 3810 | 3.2%
 

previous_cancellations
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 15
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08711784907
Minimum: 0
Maximum: 26
Zeros: 112906
Zeros (%): 94.6%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 26
Range: 26
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8443363842
Coefficient of variation (CV): 9.691887405
Kurtosis: 674.0736926
Mean: 0.08711784907
Mean Absolute Deviation (MAD): 0.1647730608
Skewness: 24.45804872
Sum: 10401
Variance: 0.7129039296
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 5.5 20. 22.5 25.5 26. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 112906 | 94.6%
1 | 6051 | 5.1%
2 | 116 | 0.1%
3 | 65 | 0.1%
24 | 48 | < 0.1%
11 | 35 | < 0.1%
4 | 31 | < 0.1%
26 | 26 | < 0.1%
25 | 25 | < 0.1%
6 | 22 | < 0.1%
Other values (5) | 65 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 112906 | 94.6%
1 | 6051 | 5.1%
2 | 116 | 0.1%
3 | 65 | 0.1%
4 | 31 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
26 | 26 | < 0.1%
25 | 25 | < 0.1%
24 | 48 | < 0.1%
21 | 1 | < 0.1%
19 | 19 | < 0.1%
 

previous_bookings_not_canceled
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 73
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1370969093
Minimum: 0
Maximum: 72
Zeros: 115770
Zeros (%): 97.0%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 72
Range: 72
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.497436848
Coefficient of variation (CV): 10.92246977
Kurtosis: 767.2452097
Mean: 0.1370969093
Mean Absolute Deviation (MAD): 0.2658800434
Skewness: 23.53979995
Sum: 16368
Variance: 2.242317113
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 10.5 14.5 25.5 30.5 72. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 115770 | 97.0%
1 | 1542 | 1.3%
2 | 580 | 0.5%
3 | 333 | 0.3%
4 | 229 | 0.2%
5 | 181 | 0.2%
6 | 115 | 0.1%
7 | 88 | 0.1%
8 | 70 | 0.1%
9 | 60 | 0.1%
Other values (63) | 422 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 115770 | 97.0%
1 | 1542 | 1.3%
2 | 580 | 0.5%
3 | 333 | 0.3%
4 | 229 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
72 | 1 | < 0.1%
71 | 1 | < 0.1%
70 | 1 | < 0.1%
69 | 1 | < 0.1%
68 | 1 | < 0.1%
 
reserved_room_type
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
A | 85994 | 72.0%
D | 19201 | 16.1%
E | 6535 | 5.5%
F | 2897 | 2.4%
G | 2094 | 1.8%
B | 1118 | 0.9%
C | 932 | 0.8%
H | 601 | 0.5%
P | 12 | < 0.1%
L | 6 | < 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Uppercase_Letter | 10 | 100.0%

Value | Count | Frequency (%)
Latin | 10 | 100.0%

Value | Count | Frequency (%)
ASCII | 10 | 100.0%
 
assigned_room_type
Categorical

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
A | 74053 | 62.0%
D | 25322 | 21.2%
E | 7806 | 6.5%
F | 3751 | 3.1%
G | 2553 | 2.1%
C | 2375 | 2.0%
B | 2163 | 1.8%
H | 712 | 0.6%
I | 363 | 0.3%
K | 279 | 0.2%
Other values (2) | 13 | < 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Uppercase_Letter | 12 | 100.0%

Value | Count | Frequency (%)
Latin | 12 | 100.0%

Value | Count | Frequency (%)
ASCII | 12 | 100.0%
 

booking_changes
Real number (ℝ≥0)

ZEROS
Distinct count: 21
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2211240472
Minimum: 0
Maximum: 21
Zeros: 101314
Zeros (%): 84.9%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 21
Range: 21
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6523055727
Coefficient of variation (CV): 2.949953118
Kurtosis: 79.39360467
Mean: 0.2211240472
Mean Absolute Deviation (MAD): 0.3752904217
Skewness: 6.000270054
Sum: 26400
Variance: 0.4255025601
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 5.5 6.5 8.5 15.5 21. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 101314 | 84.9%
1 | 12701 | 10.6%
2 | 3805 | 3.2%
3 | 927 | 0.8%
4 | 376 | 0.3%
5 | 118 | 0.1%
6 | 63 | 0.1%
7 | 31 | < 0.1%
8 | 17 | < 0.1%
9 | 8 | < 0.1%
Other values (11) | 30 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 101314 | 84.9%
1 | 12701 | 10.6%
2 | 3805 | 3.2%
3 | 927 | 0.8%
4 | 376 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
21 | 1 | < 0.1%
20 | 1 | < 0.1%
18 | 1 | < 0.1%
17 | 2 | < 0.1%
16 | 2 | < 0.1%
 

deposit_type
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
No Deposit | 104641 | 87.6%
Non Refund | 14587 | 12.2%
Refundable | 162 | 0.1%

Length

Max length: 10
Mean length: 10
Min length: 10

Value | Count | Frequency (%)
Lowercase_Letter | 13 | 76.5%
Uppercase_Letter | 3 | 17.6%
Space_Separator | 1 | 5.9%

Value | Count | Frequency (%)
Latin | 16 | 94.1%
Common | 1 | 5.9%

Value | Count | Frequency (%)
ASCII | 17 | 100.0%
 

agent
Real number (ℝ≥0)

MISSING
Distinct count: 333
Unique (%): 0.3%
Missing: 16340
Missing (%): 13.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 86.69338185
Minimum: 1
Maximum: 535
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 9
median: 14
Q3: 229
95-th percentile: 250
Maximum: 535
Range: 534
Interquartile range (IQR): 220

Descriptive statistics

Standard deviation: 110.7745476
Coefficient of variation (CV): 1.277773981
Kurtosis: -0.007179564938
Mean: 86.69338185
Mean Absolute Deviation (MAD): 97.04657859
Skewness: 1.089385636
Sum: 8933753
Variance: 12271.00041
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
9 | 31961 | 26.8%
240 | 13922 | 11.7%
1 | 7191 | 6.0%
14 | 3640 | 3.0%
7 | 3539 | 3.0%
6 | 3290 | 2.8%
250 | 2870 | 2.4%
241 | 1721 | 1.4%
28 | 1666 | 1.4%
8 | 1514 | 1.3%
Other values (323) | 31736 | 26.6%
(Missing) | 16340 | 13.7%

Minimum 5 values

Value | Count | Frequency (%)
1 | 7191 | 6.0%
2 | 162 | 0.1%
3 | 1336 | 1.1%
4 | 47 | < 0.1%
5 | 330 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
535 | 3 | < 0.1%
531 | 68 | 0.1%
527 | 35 | < 0.1%
526 | 10 | < 0.1%
510 | 2 | < 0.1%
 

company
Real number (ℝ≥0)

MISSING
Distinct count: 352
Unique (%): 5.2%
Missing: 112593
Missing (%): 94.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 189.2667353
Minimum: 6
Maximum: 543
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 6
5-th percentile: 40
Q1: 62
median: 179
Q3: 270
95-th percentile: 435
Maximum: 543
Range: 537
Interquartile range (IQR): 208

Descriptive statistics

Standard deviation: 131.6550146
Coefficient of variation (CV): 0.6956056721
Kurtosis: -0.4907952103
Mean: 189.2667353
Mean Absolute Deviation (MAD): 109.1110502
Skewness: 0.6015996673
Sum: 1286446
Variance: 17333.04288
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
40 | 927 | 0.8%
223 | 784 | 0.7%
67 | 267 | 0.2%
45 | 250 | 0.2%
153 | 215 | 0.2%
174 | 149 | 0.1%
219 | 141 | 0.1%
281 | 138 | 0.1%
154 | 133 | 0.1%
405 | 119 | 0.1%
Other values (342) | 3674 | 3.1%
(Missing) | 112593 | 94.3%

Minimum 5 values

Value | Count | Frequency (%)
6 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 37 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
543 | 2 | < 0.1%
541 | 1 | < 0.1%
539 | 2 | < 0.1%
534 | 2 | < 0.1%
531 | 1 | < 0.1%
 

days_in_waiting_list
Real number (ℝ≥0)

ZEROS
Distinct count: 128
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.321149175
Minimum: 0
Maximum: 391
Zeros: 115692
Zeros (%): 96.9%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 391
Range: 391
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17.59472088
Coefficient of variation (CV): 7.580176694
Kurtosis: 186.7930696
Mean: 2.321149175
Mean Absolute Deviation (MAD): 4.49879973
Skewness: 11.94435345
Sum: 277122
Variance: 309.5742028
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 219. 223.5 247.5 385. 391. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 115692 | 96.9%
39 | 227 | 0.2%
58 | 164 | 0.1%
44 | 141 | 0.1%
31 | 127 | 0.1%
35 | 96 | 0.1%
46 | 94 | 0.1%
69 | 89 | 0.1%
63 | 83 | 0.1%
50 | 80 | 0.1%
Other values (118) | 2597 | 2.2%

Minimum 5 values

Value | Count | Frequency (%)
0 | 115692 | 96.9%
1 | 12 | < 0.1%
2 | 5 | < 0.1%
3 | 59 | < 0.1%
4 | 25 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
391 | 45 | < 0.1%
379 | 15 | < 0.1%
330 | 15 | < 0.1%
259 | 10 | < 0.1%
236 | 35 | < 0.1%
 

customer_type
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
Transient | 89613 | 75.1%
Transient-Party | 25124 | 21.0%
Contract | 4076 | 3.4%
Group | 577 | 0.5%

Length

Max length: 15
Mean length: 10.20914649
Min length: 5

Value | Count | Frequency (%)
Lowercase_Letter | 12 | 70.6%
Uppercase_Letter | 4 | 23.5%
Dash_Punctuation | 1 | 5.9%

Value | Count | Frequency (%)
Latin | 16 | 94.1%
Common | 1 | 5.9%

Value | Count | Frequency (%)
ASCII | 17 | 100.0%
 

adr
Real number (ℝ)

ZEROS
Distinct count: 8879
Unique (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.8311215
Minimum: -6.38
Maximum: 5400
Zeros: 1959
Zeros (%): 1.6%
Memory size: 932.9 KiB

Quantile statistics

Minimum: -6.38
5-th percentile: 38.4
Q1: 69.29
median: 94.575
Q3: 126
95-th percentile: 193.5
Maximum: 5400
Range: 5406.38
Interquartile range (IQR): 56.71

Descriptive statistics

Standard deviation: 50.53579029
Coefficient of variation (CV): 0.4962705853
Kurtosis: 1013.189851
Mean: 101.8311215
Mean Absolute Deviation (MAD): 36.38052772
Skewness: 10.53021398
Sum: 12157617.6
Variance: 2553.8661
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-6.3800e+00 -3.1900e+00 1.3000e-01 5.6250e+00 6.2000e+00 ... 3.1625e+02 3.4350e+02 3.9469e+02 5.0900e+02 5.4000e+03], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
62 | 3754 | 3.1%
75 | 2715 | 2.3%
90 | 2473 | 2.1%
65 | 2418 | 2.0%
0 | 1959 | 1.6%
80 | 1889 | 1.6%
95 | 1661 | 1.4%
120 | 1607 | 1.3%
100 | 1573 | 1.3%
85 | 1538 | 1.3%
Other values (8869) | 97803 | 81.9%

Minimum 5 values

Value | Count | Frequency (%)
-6.38 | 1 | < 0.1%
0 | 1959 | 1.6%
0.26 | 1 | < 0.1%
0.5 | 1 | < 0.1%
1 | 15 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5400 | 1 | < 0.1%
510 | 1 | < 0.1%
508 | 1 | < 0.1%
451.5 | 1 | < 0.1%
450 | 1 | < 0.1%
 

required_car_parking_spaces
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06251779881
Minimum: 0
Maximum: 8
Zeros: 111974
Zeros (%): 93.8%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2452911475
Coefficient of variation (CV): 3.92354101
Kurtosis: 29.99805617
Mean: 0.06251779881
Mean Absolute Deviation (MAD): 0.1172689171
Skewness: 4.163233238
Sum: 7464
Variance: 0.06016774703
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 8. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 111974 | 93.8%
1 | 7383 | 6.2%
2 | 28 | < 0.1%
3 | 3 | < 0.1%
8 | 2 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 111974 | 93.8%
1 | 7383 | 6.2%
2 | 28 | < 0.1%
3 | 3 | < 0.1%
8 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
3 | 3 | < 0.1%
2 | 28 | < 0.1%
1 | 7383 | 6.2%
0 | 111974 | 93.8%
 

total_of_special_requests
Real number (ℝ≥0)

ZEROS
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5713627607
Minimum: 0
Maximum: 5
Zeros: 70318
Zeros (%): 58.9%
Memory size: 932.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7927984228
Coefficient of variation (CV): 1.387557043
Kurtosis: 1.492564811
Mean: 0.5713627607
Mean Absolute Deviation (MAD): 0.6730393937
Skewness: 1.349189377
Sum: 68215
Variance: 0.6285293392
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 70318 | 58.9%
1 | 33226 | 27.8%
2 | 12969 | 10.9%
3 | 2497 | 2.1%
4 | 340 | 0.3%
5 | 40 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 70318 | 58.9%
1 | 33226 | 27.8%
2 | 12969 | 10.9%
3 | 2497 | 2.1%
4 | 340 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
5 | 40 | < 0.1%
4 | 340 | 0.3%
3 | 2497 | 2.1%
2 | 12969 | 10.9%
1 | 33226 | 27.8%
 
reservation_status
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
Check-Out | 75166 | 63.0%
Canceled | 43017 | 36.0%
No-Show | 1207 | 1.0%

Length

Max length: 9
Mean length: 8.619473993
Min length: 7

Value | Count | Frequency (%)
Lowercase_Letter | 12 | 70.6%
Uppercase_Letter | 4 | 23.5%
Dash_Punctuation | 1 | 5.9%

Value | Count | Frequency (%)
Latin | 16 | 94.1%
Common | 1 | 5.9%

Value | Count | Frequency (%)
ASCII | 17 | 100.0%
 

reservation_status_date
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 926
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 932.9 KiB

Value | Count | Frequency (%)
2015-10-21 | 1461 | 1.2%
2015-07-06 | 805 | 0.7%
2016-11-25 | 790 | 0.7%
2015-01-01 | 763 | 0.6%
2016-01-18 | 625 | 0.5%
2015-07-02 | 469 | 0.4%
2016-12-07 | 450 | 0.4%
2015-12-18 | 423 | 0.4%
2016-02-09 | 412 | 0.3%
2016-04-04 | 382 | 0.3%
Other values (916) | 112810 | 94.5%

Length

Max length: 10
Mean length: 10
Min length: 10

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Dash_Punctuation | 1 | 9.1%

Value | Count | Frequency (%)
Common | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
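
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function name pearson_r and the sample data are invented for the example. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r_xy = pearson_r(x, y)

# Shifting and (positively) rescaling either variable leaves r unchanged.
x_rescaled = [3 * a + 1 for a in x]
y_rescaled = [0.5 * b - 2 for b in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

print(r_xy, r_rescaled)
```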

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
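
A minimal sketch of this rank-based calculation follows. It assumes tied values receive the average of the ranks they span, which is a common convention but is not stated in the report; the helper names average_ranks, pearson_r and spearman_rho are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # monotonic but not linear

print(spearman_rho(x, y), pearson_r(x, y))
```

On this cubic example the rank correlation is perfect while Pearson's r falls short of 1, which is the difference the paragraph above describes.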

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
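
The pair-counting definition can be sketched directly. The code below implements the simple variant often called tau-a, which matches the description here; it is an assumption that this is adequate for illustration, since libraries commonly use tie-adjusted variants such as tau-b. The function name kendall_tau_a is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

This quadratic-time loop is fine for small samples; for a dataset the size of the one profiled here, a library routine would be the practical choice.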

Missing values

Sample

First rows

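index | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date
0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 3 | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01
1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 4 | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01
2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | 0.0 | 0 | BB | GBR | Direct | Direct | 0 | 0 | 0 | A | C | 0 | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02
3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | 0.0 | 0 | BB | GBR | Corporate | Corporate | 0 | 0 | 0 | A | A | 0 | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02
4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03
5 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03
6 | Resort Hotel | 0 | 0 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 0 | No Deposit | NaN | NaN | 0 | Transient | 107.0 | 0 | 0 | Check-Out | 2015-07-03
7 | Resort Hotel | 0 | 9 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | FB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 0 | No Deposit | 303.0 | NaN | 0 | Transient | 103.0 | 0 | 1 | Check-Out | 2015-07-03
8 | Resort Hotel | 1 | 85 | 2015 | July | 27 | 1 | 0 | 3 | 2 | 0.0 | 0 | BB | PRT | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 82.0 | 0 | 1 | Canceled | 2015-05-06
9 | Resort Hotel | 1 | 75 | 2015 | July | 27 | 1 | 0 | 3 | 2 | 0.0 | 0 | HB | PRT | Offline TA/TO | TA/TO | 0 | 0 | 0 | D | D | 0 | No Deposit | 15.0 | NaN | 0 | Transient | 105.5 | 0 | 0 | Canceled | 2015-04-22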

Last rows

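index | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date
119380 | City Hotel | 0 | 44 | 2017 | August | 35 | 31 | 1 | 3 | 2 | 0.0 | 0 | SC | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 140.75 | 0 | 1 | Check-Out | 2017-09-04
119381 | City Hotel | 0 | 188 | 2017 | August | 35 | 31 | 2 | 3 | 2 | 0.0 | 0 | BB | DEU | Direct | Direct | 0 | 0 | 0 | A | A | 0 | No Deposit | 14.0 | NaN | 0 | Transient | 99.00 | 0 | 0 | Check-Out | 2017-09-05
119382 | City Hotel | 0 | 135 | 2017 | August | 35 | 30 | 2 | 4 | 3 | 0.0 | 0 | BB | JPN | Online TA | TA/TO | 0 | 0 | 0 | G | G | 0 | No Deposit | 7.0 | NaN | 0 | Transient | 209.00 | 0 | 0 | Check-Out | 2017-09-05
119383 | City Hotel | 0 | 164 | 2017 | August | 35 | 31 | 2 | 4 | 2 | 0.0 | 0 | BB | DEU | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 42.0 | NaN | 0 | Transient | 87.60 | 0 | 0 | Check-Out | 2017-09-06
119384 | City Hotel | 0 | 21 | 2017 | August | 35 | 30 | 2 | 5 | 2 | 0.0 | 0 | BB | BEL | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 394.0 | NaN | 0 | Transient | 96.14 | 0 | 2 | Check-Out | 2017-09-06
119385 | City Hotel | 0 | 23 | 2017 | August | 35 | 30 | 2 | 5 | 2 | 0.0 | 0 | BB | BEL | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 394.0 | NaN | 0 | Transient | 96.14 | 0 | 0 | Check-Out | 2017-09-06
119386 | City Hotel | 0 | 102 | 2017 | August | 35 | 31 | 2 | 5 | 3 | 0.0 | 0 | BB | FRA | Online TA | TA/TO | 0 | 0 | 0 | E | E | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 225.43 | 0 | 2 | Check-Out | 2017-09-07
119387 | City Hotel | 0 | 34 | 2017 | August | 35 | 31 | 2 | 5 | 2 | 0.0 | 0 | BB | DEU | Online TA | TA/TO | 0 | 0 | 0 | D | D | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 157.71 | 0 | 4 | Check-Out | 2017-09-07
119388 | City Hotel | 0 | 109 | 2017 | August | 35 | 31 | 2 | 5 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 89.0 | NaN | 0 | Transient | 104.40 | 0 | 0 | Check-Out | 2017-09-07
119389 | City Hotel | 0 | 205 | 2017 | August | 35 | 29 | 2 | 7 | 2 | 0.0 | 0 | HB | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 151.20 | 0 | 2 | Check-Out | 2017-09-07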